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Supplementary Figure 1 : Recombination rate inference from PSMC 

MSMC for two haplotypes is a special case that we call PSMC, in contrast to PSMC 
because we use SMC' (see Supplementary Note) as underlying coalescent model. Here 
we show the iterative estimation of the recombination rate for two demographic scenarios: 
i) a constant population size and ii) a bottleneck in the past. As can be seen, in both cases 
the estimated recombination rate converges quickly to the true value with very high 
accuracy. 
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Supplementary Figure 2: Simulations of other scenarios 

See Supplementary Note for details on these additional simulations, (a) MSMC results 
from simulated data which represents a much simplified CEU population size history with 
sharp changes, (b) Similar to a) but with a simplified YRI-like history, (c) A population split 
with subsequent migration, (d) A population split with subsequent population size changes, 
(e) Inference from 8 and 16 haplotypes. For 16 haplotypes, we needed to reduce the 
computational complexity by reducing the simulated sequence to 1Gb instead of 3 Gb, and 
used a coarse-grained set of parameters with 20 time intervals instead of 40. 
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Supplementary Figure 3: Testing Singleton Branch Length Estimates 

Here we compare MSMC estimates based on the estimates of T s obtained via the HMM 
described in Supplementary Note (solid), section 7 with the true values of the singleton 
branch length as output in the simulation (dotted), (a) shows population size estimates, (b) 
shows split estimates. 
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Supplementary Figure 4: Simulations with recombination hotspots 

To assess the effect of heterogeneous recombination rates across the genome, we 
simulated 100 chromosomes with 4 haplotypes of 1Mb each, (see Supplementary Note). 
For practical reasons we could not scale up this simulation to 3Gb of total sequence or to 
more haplotypes. We used as input random chunks of the real human recombination map 
from the HapMap project, (a) This plot shows the effective population size estimates from 
both the standard simulation and simulations with the human recombination map. We see 
only small effects of variable recombination rates mostly in the two extreme ends of the 
estimated time interval. Some of that difference may also be caused by the much smaller 
total sequence length of the hotspot simulation in comparison to the standard simulation, 
(b) Here we show the two split scenarios at 10kya and 100kya. Again, the differences 
between the hotspot simulation and the standard simulation are only small. 
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Supplementary Figure 5: Application to unphased data 

We generated datasets in which we deliberately "unphased" one or both diploid genomes 
in a setting of four haplotypes. In (a) we plot the population size estimates from two diploid 
individuals of which both are phased (red), one is unphased (blue) and both unphased 
(purple). In (b) we plot the relative cross coalescence rate estimates based on similarly 
unphased data. 
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Supplementary Figure 6: Comparison of trio- vs. population-phasing and effect of 
unphased sites 

We tested, whether results from trio-phased data differs from population-phased phasing. 
We checked this for CEU and YRI, for which we have trio-sequences available. As shown 
in (a) and (b), the trio-phased results do not differ strongly from the population phased 
results (solid vs. dashed lines). When population-phasing our sequences, there are rare 
sites which are not present in the reference data set and are therefore not phased. There 
are two possibilities: i) leave them in as unphased sites ("all"), ii) remove them from the 
analysis ("restricted"). The two cases are shown in a) and b) as solid vs. dotted lines. As 
shown, for population size inference, removing unphased sites does not appear to improve 
estimates (in comparison to the trio-phased estimates), but for the population separation 
analysis, removing unphased sites gives smoother estimates in the most recent times and 
removes some non-monotonic artifacts (in CHB/GIH). 
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Supplementary Figure 7: Comparisons of population size estimates with two, four 
and eight haplotypes 

(a) We show the estimates based on four haplotypes (solid lines) together with estimates 
from two haplotypes (dotted lines). For clarity, we separated the curves based on African 
and Non-African samples, (b) This plot shows estimates based on eight haplotypes (thick 
lines) in comparison with the estimates based on four haplotypes (thin lines). 
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Supplementary Figure 8: Replicate analysis with four haplotypes 

We generated a replicate set of population size and relative cross coalescence rate 
estimates, based on the two individuals in each population not used for the main analysis, 
as presented in Figures 3 and 4. In both figures, the replicate estimate is shown as dashed 
line, the original estimate as solid line. For clarity, African and Non-African estimates are 
separated. 
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Supplementary Figure 9: Comparison of relative cross coalescence rate estimates 
with four and eight haplotypes 

Here we show relative cross coalescence rate estimates based on eight haplotypes (four 
haplotypes from each population, in solid lines), with estimates from four haplotypes (two 
haplotypes from each population, in dotted lines). 
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Supplementary Figure 10: Comparison with diCal 

A different method to estimate historical population sizes from multiple phased haplotypes 
was recently implemented in the software diCal(see Supplementary Note). Here we have 
applied diCal to 8 haplotypes, each 10Mb long, simulated using the zig-zag population 
size history as in Figure 2. The relatively short length of 10Mb is the same length as used 
in Sheehan et al. 2013; with the current diCal implementation, analysis of larger data sets 
is not practical. We tested three different time intervals, using the parameter "-t", which 
sets the left boundary of the last time interval in scaled units. With the "-t 1" option in the 
plot below, diCal obtains correct estimates between 20kya and 200kya (red curve), roughly 
the same period addressed by MSMC with 2 haplotypes. To explore more recent times 
that we access with 4 or 8 haplotypes we tried to change the default time interval by using 
lower "-t" values (see Supplementary Note for details), but the resulting population size 
estimates were not so good (purple and blue lines). The method may be able to perform 
better if a more efficient implementation allows it to run on whole-genome sized datasets. 
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Supplementary Table 1 



Sample names and 


oooulations 


Sample ID 


Population 


NA18526 


CHB 


NA18537 


CHB 


NA18555 


CHB 


NA18558 


CHB 


NA20845 


GIH 


NA20846 


GIH 


NA20847 


GIH 


NA20850 


GIH 


NA18940 


JPT 


NA18942 


JPT 


NA18947 


JPT 


NA18956 


JPT 


NA19017 


LWK 


NA19020 


LWK 


NA19025 


LWK 


NA19026 


LWK 


NA21732 


MKK 


NA21733 


MKK 


NA21737 


MKK 


NA21767 


MKK 



Sample ID 


Population 


NA19735 


MXL 


NA19649 


MXL 


NA19669 


MXL 


NA19670 


MXL 


NA20502 


TSI 


NA20509 


TSI 


NA20510 


TSI 


NA20511 


TSI 


NA19238 


YRI 


NA19239 


YRI 


NA19240* 


YRI 


NA18501 


YRI 


NA18502 


YRI 


NA12878* 


CEU 


NA12891 


CEU 


NA12892 


CEU 


NA06985 


CEU 


NA06994 


CEU 
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Supplementary Table 2 

D statistic for historic gene flow from CHB to GIH 




A 


B 


B 


A 


253926 


0.0729 


>549 


B 


A 


B 


A 


219404 






CHB 


GIH 


CEU 


YRI 


# sites 






A 


B 


B 


A 


291871 


0.1417 


>2240 


R 


A 


B 


A 


?1 Q404 










MXL (only 
Nat. Am.) 


YRI 
i n i 


it citpQ 






A 


B 


B 


A 


159754 


0.0454 


> 138 


B 


A 


B 


A 


145869 






MXL (only 
Nat. Am.) 




YRI 


# sites 






A 


B 


B 


A 


194481 


0.1428 


> 1514 


B 


A 


B 


A 


145869 
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Supplementary Table 3 

Sample Population Statistics 



Populati 
on 


# called sites 


# segregating 
sites 


# unphased 


% unphased 


Theta 


CEU 


2,132,078,340 


3,973,757 


124,972 


0.031449331 


7.19E-04 


CHB 


2,132,530,333 


3,734,161 


148,046 


0.039646389 


6.75E-04 


GIH 


2,137,960,203 


4,180,134 


279,164 


0.066783505 


7.54E-04 


JPT 


2,128,798,051 


3,689,030 


106,160 


0.028777212 


6.68E-04 


LWK 


2,110,232,589 


5,498,240 


225,690 


0.041047681 


1 .00E-03 


MKK 


2,132,615,842 


5,199,644 


334,893 


0.064406909 


9.40E-04 


MXL 


2,113,376,228 


3,958,912 


131,396 


0.033189927 


7.22E-04 


TSI 


2,133,276,458 


4,031,944 


124,492 


0.030876421 


7.29E-04 


YRI 


2,133,075,233 


5,646,792 


197,841 


0.035035999 


1 .02E-03 
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Supplementary Table 4 



Inbreeding and cryptic relatedness 



Pnm il sit inn 


within samples tt w 


samples n c 


TT /tT.. 

1 l(y 1 lw 


CEU 


7.40E-04 


7.39E-04 


0.998 


CHB 


6.98E-04 


6.96E-04 


0.998 


GIH 


7.65E-04 


7.68E-04 


1 .004 


JPT 


6.97E-04 


6.91 E-04 


0.992 


1 lAfIX 

LWK 


n ml — r\A 

9.5ob-04 


9.57b-04 


1 .000 


i\ /i ix ix mm 

mkk all 


9.b0b-04 


n one o /i 
9.^b-04 


0.9b0 


l\ fl IX IX / n „ 1 , , 

MKK (only 
NA21732 and 
NA21 737) 


9.5bb-04 


7.^bb-04 


0.7b0 


MXL 


7.14E-04 


7.36E-04 


1.030 


TSI 


7.42E-04 


7.45E-04 


1.003 


YRI 


9.85E-04 


9.83E-04 


0.998 
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Supplementary Note 

The Multiple Sequentially Markovian Coalescent (MSMC) 



Stephan Schiffels and Richard Durbin 

Wellcome Trust Sanger Institute 

1. MSMC Continuous Time Transition Probability 

Each state of MSMC is defined by a triple (/, j, t) with i < j, where i and j denote the indices of the two samples that coalesce first, and 
/ is the time of first coalescence. The condition < j comes from the fact that we consider all unordered pairs of two individuals, of 
I M\ 

which there are I I in total, where M is the number of haplotypes. We consider the conditional transition probability to switch from 
the state (k, I, s) to (/, j, t), given a recombination event at time u < s in leaf-branch m. 

We distinguish between three types of transitions, determined by how the first coalescence time changes or not: t < s, t = s, and t > s. 
The three cases are detailed in the following. 

■ Definitions 

We denote the scaled rate of coalescence between branch i and j over time by \'J(t) and always assume symmetric rates 
\'J(t) = A J, '(t).We define the following convenient marginal rates: the total coalescence rate of branch i to all branches (including itself): 

M 

and the overall total coalescence rate of any two branches: 

M-l M 

We also define these exponentiated integrals: 
L m (h\h) = exp|-jTA m (v)rf> 



and 



L(ti ; t 2 ) = expi 



(-r 



A(v)dv 



■ Equilibrium probability 

The equilibrium probability is then given as 

q 0 (t,i,j) = X i J(t)L(0-t), 

which is normalized, as shown in section 11. 

■ Transitions with t < s 



X 



f?i 



As shown in the picture, for this transition to occur we need to have m e ((', j) and the resulting floating branch coalescence needs to 
take place between i and j at time t and it must not coalesce with any other branch, including itself before t. This can be summarized as: 
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q(i, j, t\k,l, s, u, m) t<s = 



\<J(t)L m (u;t) if me{i,j] and u < t. 



0 



else 



■ Transitions with t = s 



n 




n p 



For this to happen, one of two things must occur: Either, any branch with m $ {k, 1} does coalesce later than t, or second, any floating 
branch self-coalesces before time / (see Figure). This results in the following terms: 

L m (u\ t) if m±k and m± I 
else 



q(i, j, t\k, I, s, u, m) r=s . = P X m/n (jc) L"'(u; k) <Jk + j L 



■ Transitions with t > s 



k l,m 



Here we require m e {k, l\, i.e. the recombination needs to break the existing pair of first coalescence. A floating branch is then created 
and it must not coalesce back with any branch (including itself) before time s. It is not necessary that this particular floating branch will 
coalesce at time t, since any two other branches could coalesce then and hence make up the new pair (i, j). We therefore require that no 
pair coalesces before time t, resulting in the term: 



q(i, j, t\k,l, s, u, m) t>s = 



\'-J(t) L m (u- s) L(s- 1) if m e {*, /} 



0 



else. 



■ Full transition probability 

The three cases above can be summarized as: 
q(i, j, t\k, 1, s, u, m) = 



8(t - s) fa <5, 7 ( JV"V) L m (u; k) dK + {\- S„ a ) (l - S mJ ) L m (u; ?)) + 



(<V; + S mj) *- tJ (t) L m (u; t) ®(t -u) for t < s 
{S„a + S„,.i) K J (t) L m (u; s) Us 1 , t) for t > s. 
We show in section 1 1 that this conditional probability is normalized, i.e. that 

M 

y I q(i, j, t \k, I, s, u,m) dt = 1, 

i<j 0 

for fixed k, I, s, u and m. 

The full transition probability is then a sum over both cases with and without recombination, integrating over the parameters s and m 
with uniform probability density: 

1 1 

q(i, j,t\k,l,s) = e- Mrs S(t-s)S ik 8 n + {\-e- Mrs ) ) q(i, j, t\k, I, s, u, m) du 

s M Jo ~i 



which results in the expression: 

q(i, j, t\k,l,s) = 8(t - s) S itk djjqdk, I, s) + q 2 (i, j, t\k,l, s) 
with 

J M 



qi(k,l,s): 
and 



1 1 rt " rt M 

n + (i - e - M ") — j YjJ A '"' ra ^) L ™(" ; *) dK + Yj LW( " ; 



d u 



q 2 (i, j, t\k,l, s) = (1 - e 



' ** r ,\ rm/ 



if t < s 



s M 



\ L(s;t) £Y m={kJ) L'"(u;s)du if t>s. 



(1) 



(2) 



(3) 
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Note that because of the normalization we always have 

M po 

qiik, I, s) = 1 - V I q 2 (i, j, t\k,l, s)dt. 

It will therefore suffice to constrain all further derivations to q 2 (i, j, t\k, I, s) and omit explicit expressions for q { (k, I, s). 



(4) 



Note that the special case M = 2 does not reduce to the original PSMC model as derived in [1]. Instead it differs from PSMC in a subtle 
way, because here we explicitly include the possibility that the floating branch coalesces back onto itself. This modification corre- 
sponds to the introduction of SMC [2] after SMC [3]. 

■ Piecewise constant coalescence rate 

We introduce discrete ordered time boundaries T a for a = 0 ... n with Tq = 0 and T„ = oo. We define piecewise constant population 
sizes which correspond to piecewise constant coalescence rates: 



X'- J (t) = XJforT a <t<T a+l . 

We define the discretized marginal rates 

M-l M 



(5) 



A«=ZZ^ 



=i j=i+i 



and 



a: 



a ~ /_l a 



In practice, we use quantile boundaries of the exponential distribution with mean 



(in time units of 2 N 0 generations): 



r «=- iog K) 



a \ KM 
2 )' 



(6) 



Let the next lower time boundary from /] be yS, and the next lower time boundary from t 2 be a. We use the shortcut A ff = T a+ i - T a . 
The exponentiated integrals L(ti; t 2 ) and L m (t\ \ t 2 ) are then expressed as: 



( 



Utuh) | a+/J = exp 



K=fS+l 



Ut\\h) \ a=j3 = exp(-(J 2 - h) A ff ). 

and similar formulas for L m (t\\ t 2 ), replacing A a with A™. 

With these expressions, we can compute all the integrals in the transition probability: 

1 1 



q 2 (i, j, t;a\k, I, s; yS) | (<J = (1 - e 



s M 



1 1 • r 
= fl- e *')--tf^ ' 



= (1 
= (1 



s M 

1 1 

s M 



$(t) \ V L m (u;t)du 
Jo ^ J ., 

Yj\ + Z L m (u;t)du + I Yj L m (u;t)du 

y=0 r m={i,j) T "m={i,j) 

A «jZ Z L m (T 7+l -t)£"'e- (T ^ A '"du+ Y £e- ( '- u)A "du 



1 1 

s M 



7=0 m=[i,j) 
a-l J 



m=\i,j) 
1 



A™ 



and 



1 1 rs _ 

? 2 («. y. f; a I *, /, j; /J) U s = (l - e~ Mrs ) A' -'(t) L{s\ t) > L m (u;s)du 

s M Jo z -'„ 



m={k)) 



= (1 



1 1 

s M 



Z Z f ' ' Lm ( u > s ) du + Z f L m (u;s)du 



\y=0m={k,l) 7 



m={k)) 



(V) 
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= (1 
= (1 
= (1 



11 . . 
■e- Mrs )-—X , 0 J L(s;t) 
s M 



1 1 



s M 



Y y f ' ' Lm (^ s ) du + Y f Lm< - u ' s ') du 

7=0 m={kfi T y m={ki\ T " 

Y Y L m (T T+l ; S )J^'e- (T ^-" )A "du+ Y £e-™*du 



1 1 

s M 



7=0 m={k,l) 



m={kf\ " 



{y=0 m={k,l) 7 m={kj) /3 



■(s-?»AJ 



2. MSMC Discrete Time Transition Probability 



We first compute the total weight of the equilibrium probability over each time interval: 

X a ' L(0;t)dt = $ L(0-T a ) e- ( '- T - )A "dt=—L(0-T a )(l-e- A - A -) 
t„ Jt„ A a 

For each time interval yS we now compute the average coalescence time: 



1 f7>+i 

t q 0 (t\ P) dt = 



A« 



L(0; 7^(1 -tT'W) A, 



tL(0;t)dt 

JTa 



1 - e -^h\ Jt,, 



(l + A^-e-^i + A^r^)) 



Note that this expression for (tpj has a numerical instability for Ap s 10 3 . We set the following asymptotic values: 
(Tp + Tp +1 ) 1 2 for Ap < 10~ 3 and /3 + 1 < co 



T„ + (Aa) for A„ < 1(T 3 andyS + 1 = oo . 



(9) 



To get discrete transition probabilities, we integrate the transition probability for each time interval a through [T a ; T a+1 ], replacing the 
time s with (tp)\ 

q(i, j, a\k,l,0) = q(i, j, t\k,l, {tp))dt 

■ First case: t<s 

q(i, j, a\k,l, P) | 0</3 



(l-e- M 'W)- 



; ; "fz Z ^ I 1 - ^ A ">iv i; o + Z i (i - : 



" Aa 



»« 

m=(< J) a: 



(1 _ e ~Mr(,^ 



1 1 



t*) M 



Z Z ^-e-^MT J+1 ;T a ) J^V™^ + ^ U*.- *) 
\y=0 m={i,j\ A y „,=[,- ;i A„ v */r„ 



m=(i Jl " 



= ( 1 _ e -Mr<^__ A V 

■ Second case: t>s 

j, or | k, I, P) \ a> e 



Z Z ^(i-^Mr^rJ^ti-^). X -LL-±(i-^*)) 

7=0m=(iV) "7 il a m=(ij| yl a V " a ^ 



1 1 1 1 

e -Mr W] x , L((t ^ t) ^ ^ (l _ e -A^») Ll r r+i; ^}) + ^ (l _ e «,Ar 



A™ 

1,7=0 m=(t,/} J1 7 



m=\k)\ 



A"' 



dt 
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1 1 



M 



L{( t/3 );T a ) 



'M i 

2 2 A m 



A"' 



-l(tf)-T f )Af • 



3. MSMC Emission Probability 



An observation consists of an allele string O with alleles denoted as O' e {1, 0). Real allele strings with letters (A, C, T, G\ can be 
normalized to (1, 0), approximating rare multi-allelic SNPs by merging of two arbitrary alleles. 

■ Singleton branch length 

To define the emission probabilities, we need to define the singleton branch length, T s , a property of the tree that can not be expressed 
through the hidden state alone. As shown in the following picture, the singleton branch length of the tree is the sum of all those 
branches which would give rise to singleton mutations, should a mutation occur on that branch. Note that this definition is defined for 
minor allele counts. This means that T s includes those branches of the tree which result in a derived allele count of M - 1 alleles. The 
singleton branch length can be estimated locally from real data in an approximately independent analysis from MSMC itself, see 
section 6. For the following, we assume that we have an estimate of T s at every position along the samples. 



Singleton branch length Ts 



First coalescence time t { 



Note that the singleton branch length is not strictly independent from the first coalescence time. For MSMC we ignore this interdepen- 
dency, as justified from simulations, see main text and Supplementary Figure 3. Instead, we treat the local estimate T s as a soft 
constraint. This means that in the emission probabilities below we always consider max(T s , M t) when we write T s . This means that we 
always set T s = M t when we compute a term with M t>T s . This is an evident requirement, as seen in the picture above: The sum of all 
blue branches must be at least as long as M times the first coalescence time /. 

■ Emission categories 

For more than two haplotypes, i.e. M > 2, the possible observations given state {t, i, j) fall into the following categories, corresponding 
to the list in Online Methods: 

1. All alleles are the same, no mutation occured within the singleton branches nor in the rest of the tree, which happens with 
probability 

e(0\t, i,f) = \- l iT s . 

In this formula, we ignore the rest of the tree outside of T s : since we only know the average branch length of the tree, this would 
result simply in a constant factor, independent of the state (t, i, j) at that position, and hence does not affect the normalized 
posterior probability across states. 

2. The two alleles in samples and j differ, O' + 0 J , and this is the only difference between any two samples. In this case, a mutation 
occurred in one of the leaf branches i or j, resulting in the emission probability 

e(0\t,i,f)=ixt 

3. Both leaves i and j have the same allele, O' = 0 J , and exactly one of the other alleles differs from O' and OK In that case, a 
mutation occurred somewhere within the singleton branch length T s of the tree, but outside the two leaf branches leading to ( and j. 
Because this includes mutations with resulting allele count M - 1, as discussed above, the emission probability consists of two 
terms. First, the probability that a mutation occurs within T s but somewhere higher up the tree than time t is fi(T s - M t). To turn 
this into the probability to observe a specific singleton, this needs to be normalized by the number of possible observations with 
minor allele count 1 outside the pair of first coalescence, which is M - 2. Second, a mutation may have occured more recently than 
/ in the leaf branch carrying the singleton allele, which happens with probability \i t: 

H (T s -Mi) 

e(0 1 1, i, j) = +nt. 

M -2 

4. A higher frequency variant with minor allele count larger than 1 occured, but we have O' = O' . This means that no mutation 
occurred anywhere within T s so this results in the same probability as for category 1 : 

e{0\t,i,f) = \- l iT s . 

This ignores any terms from mutations higher up the tree, with the same argument as in category 1 above. For the same reason we 
do not normalize this term by the number of possible observations, as is necessary in the first term of category 2. 

5. A higher frequency variant with minor allele count larger than 1 occured, but we have O' O' . This requires at least two 
mutations, one within the pair of first coalescence and one outside, which we assume to occur with zero probability: 
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e(O\t,i,j)=0 

6. Some samples have no alleles called at a position. As in many other HMM implementations, we model missing data by setting the 
emission probability to 1 for all states: 

e(0\t,i,j) = \. 

4. MSMC Hidden Markov Model 

We can now define a Hidden Markov Model, using the above defined transition and emission probabilities. For a given sequence of 
length L, we define the observations as 0\ ... 0 L . 

We define a forward variable f\(ct, i, j) . . . f L (a, i, j) by the recursion relation: 

• Initialisation 

Ma, i, j) = q 0 (a)e(O\ \ a, i, j) 

• Recursion 

f„(a, i, j) = e(O n | a, i, j) £ q(a, i, j | /?, k, 0/„-iG8, k, I) for 2 < n < L, 

M<i) 

where the notation [k < 1} means that the sum over those indices is performed only through the set of unordered pairs, i.e. with the 
constraint k < I or < ;'. 

The backwards variable b\(a, i, j) ... b L (a, i, j) is defined analogously: 

• Initialisation 
b L (a, i, j) = 1 

• Recursion 

b„(P, k,l)= ^ e(O n+l | a, i, j) q(a, i, j \ yS, k, I) b n+ i(a, i, j) for 1 < n < L. 

a,{i<j\ 

Naively implemented, the transition complexity of the forward- and backward-recursion scales quadratic in the number of time 
segments, and it scales as M 4 with the number of haplotypes M. As we show in the following, this complexity can be reduced to the 
second power of M by exploiting symmetries in the transition matrix. 

■ Exploiting the transition symmetries for optimization 

We normally do not expect all ^ ^ j different coalescence rates (all pairs of indices ;') per time interval to be different. Instead, we 
normally consider the case in which our samples come in groups, forming subpopulations. 

More formally we define index sets /], I 2 .../„,, such that each pair index j is in exactly one index set. For example, if we consider 
M = 4 haplotypes from only one population we have one index set Ii = ((1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)) in which all pair 
indices are. If the haplotypes come from two different subpopulations, with the first two haplotypes from subpopulation 1 and the other 
two from subpopulation 2, we have three index sets: 

Ii = {(1, 2)} (first coalescence is within population 1) 

h = {(1, 3), (1,4), (2, 3), (2, 4) (first coalescence is across populations) 

h = {(3, 4)) (first coalescence is within population 2) 

We define the marginal index ipjj to yield the index set in which a given pair ;' resides. We set equal coalescence rates for all pairs 
(;', ./') within a given index set, and denote this coalescence rate for index set u by X u a . We then have 

X' a J = X u a with u = (fiij 

Because the coalescence rates are symmetric within each index set, so are the transition probabilities: 

q(a, i, j | fj, k, I) = q x {a, <pjj} S aJj <% S ]t + q 2 (a, <Pij \ 0, <p kJ ) 

We can exploit this fact to optimize the transition process in the forward- and backward-algorithm of the Hidden Markov Model. 
Without optimization, we had the forward recursion: 
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Ma, i, j) = e(0„ | a, i, j) ^ /„_i(/3, k, I) q(a, i, j; /3, k, I), 

which scales as M 4 (needs to be evaluated for all a, i, j, each of which has a sum over /3, k, I). 
To optimize, we define the following marginal forward variable: 

F„(a, u) = ^ f„(a, i, j) 

ijel. 

With this definition the recursion reads 



f n (a, i, j) = e(0„ | a, i, j) 



f n -i(a, i, j)q x {a, </?;.,) + Y/n-\(P, v)q 2 (a, ip Lj ; B, v) 

fl,v 



which scales as n,- 2 , where n ; is the number of index sets (which is rougly quadratic in the number of subpopulations) since no sum 
across pair tuples are involved. 

Similarly, for the unoptimized version of the backwards recursion we have 
e(O n+i | a, (', j) b n+l (a, i, j) q(a, i, j; /?, k, I). 

aj<j 

We define the marginal backward variable: 
B„(a, u) = b n(a, i, j)e(0„ | a, i, j). 

With this we get 

b„(P, k, I) = e(O n+[ | yS, k, l)b n+x {fi, k, l)q { (P, tp u ) + ^B n+l (a, u)q 2 (a, u; B, <p kJ ), 

■ Optimization for SNP-free regions 

We now consider the interval [m, n 2 ] with length n 2 -n l =l in which sites have no SNPs. We then have trivial emission probabilities 
of 

e(O n | a, i, j) = e Q (a) for n e [n\, n 2 ]. 

■ Forward Recursion 

We can then write the forward recursion through the entire region as 

f n (a, i, j) = £ g'(a, i, J I P, k, Df n -AP, k, I) 

/SMI 

with the matrix power g l , defined recursively: 
g\a, i, j | B, k, I) = q{a, i, j | B, k, I) e 0 (a) 

g\a, i, j | p, k, I) = e 0 (a) ^ g'~\y, m, n \ B, k, I) q(a, i, j \ y, m, n). 

We assume that we can write the propagation matrix in the form 

g\a, i, j | p, k, I) = 8 afi <5 a Sjj g[(a, tp t J\ + g l 2 (a, <p tJ | /?, <p kJ ), 

with g\(a, u) = qi(a, u) eo(a) and g\(a, u\ (S,v) = q 2 (a, u \ P, v) e 0 (a). 

We can hence write the recursion equation using this form and the similar form for the transition matrix: 

g\a, i, j | 8, k, I) 

= e Q (a) (S yfi 8 mk S nJ g'f\y, ip m „) + g 2 \y, fc, | /?, <p kt )) {S a , y S im 6 jfl qi{a, <p L j) + q 2 (a, ViJ \ y, ip mfl )) 

yjn<n 

= e 0 (a) (6 yfi 8 mk S nJ g[~ l (y, ip m „) 6 a , y 5 im 8 jjt q x {a, ip,j) + g' 2 l (y, <p m ,„ \ B, <p kl ) 6 ay 6 im 8 jfl qi{a, Wj ) + 

yjn <n 

Sy,{! 3m,k S nJ g'f\y, Vm„) q 2 {a, <p Lj I y, ipmfi) + gl\y, <Pm,n \ P, <Pk,l) <n{a, IfiiJ \ 7, Vmfl)) 
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( 



= e Q (a) 



<W <% S u , g\ '(a, ipij) qi(a, <Pij) + g\ \a, ip kJ \ P, ipij) qi(a, ip^) + 



g\ \P, fkj) n{a, fij | P, <P k j) + YjW 1 " II g 2 w I P< fki) 1i(a, Vij | 7, w) 



where ||/„ || is the number of pairs in the set /„. 
We read off the two recursion relations: 



g[(a, u) = e 0 (a)g\ \a, u)q x (a, u), 



and g l 2 (®, u I P> v) = eo(a) 



g l 2 \a, u\0, v)qi(a, u) + g\ \p, v)q 2 (a, u \ p, v) + ^||/w||g2 \y, w\P, v)q 2 (a, u\y,w) 



Given the form of the propagation matrix g l , we can make use of the above defined marginal forward vector F„(a): 
f n (a, i, j) = f n _,(a, i, j)g[(a, <Pij) + J]g 2 (a, fij \ P, v)F„_,(J3, v) 

■ Backward Recursion 

Similarly, for the backward recursion, we have 

b n (J3, k,l)=Yj h '( a > J\P,k,l) b n+l (a, i, j) 

aj<j 

with 

h\a, i, j | P, k, I) =g 1 = q(a, i, j | p, k, I) e 0 (a) 

h l (a, i, j \p,k,l)= ^ h'~ [ (a, i, j | y, m, n)q(y, m, n\P, k, l)e 0 (y) 

yjn<n 

We try to write h l in the form 

h\a, i, j | p, k, I) = 6 afi <5 a Sjj h[(a, ifij) + h' 2 (a, Vuj \ (3, <p kJ ), 

with h\(a, u) = qi(a, u) e 0 (a) and h\(a, u\ p,v) = q 2 (a, u \ P, v) e 0 (a). 

For the recursion it then follows: 

h\a, i, j\P,k,l) 

= Yj e 0 (y) {S a , r S im S M h\- l (a, <pij) + h' 2 [ (a, <p Lj \ y, ip mfl )) [8 yfi 6 mJl 8 nJ qi(y, ip mA ) + q 2 (y, ip mfl \ P, ip ki )) 

yjn<n 

= Yj e 0 (y) {S a ,y S im 6 M h\- [ (a, tp,^ 8 yJ3 8 mJl 8 ni q x (y, <p m/l ) + h l 2 \a, ip u | y, <p mfl ) 8 yfi 8 mJl S nJ qi(y, ip mA ) + 

yjn<n 

S a , r 8 im S M h\- l {a, <pij) q 2 (y, ip mn \ p, <p kJ ) + h' 2 l (a, <p Lj \ y, ip mfl ) q 2 (y, ip mfl \ /?, tp k) )) 
= S afi S Kk Sjj h\- l (a, tpijj qi(a, tpijj e 0 (a) + h' 2 [ (a, ViJ \ p, ip kJ ) qi (/3, ip kJ ) e 0 (J3) + 

e 0 (a, <fiij) h l [\a, ip t J) q 2 {a, ip uj \ p, <p kJ ) + Yj\l w ||^ 1 ( Q '' Vij \ 7- *f) l2(y, w \ P, <p kJ ) e 0 (y), 

y,w 

from which we read off the two recursion relations: 

h\(a, u) = e 0 (a) h l [ l (a, u) qi(a, u), and h' 2 (a, u \ P, v) = 

h' 2 \a, u | P, v)qi(ft, v)e 0 (J5) + e 0 (a) h[-\a, u)q 2 (a, u\p,v) + \\h' 2 \a, u \ y, w)q 2 (y, w \ p, v) e 0 (y) 

We can again use a marginalized version of the backward variable for speed up: 
b n (P, k, I) = h[(p, <p k) ) b„JJ3, k, I) + J]h l 2 (a, u | p, <p kJ ) B n+l (a, u) 

with 
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B„{a, u) = ^ fo„(a, fc, /), 

which is different from B„(a, u), as it does not use the emission probability. 

■ Missing data 

For contiguous segments of missing data, we can apply the same recipe as for homozygous regions, simply replacing the emission 
probability £0(0-, j) with 1 for all states. All the matrix powers as described above can then be computed with the same recursion 
formulas. 

5. Parameter Re-estimation 

The objective function in the Baum- Welch algorithm is defined as 

F(e, e) = J] 1o s(<?K J I P> fe > l > 6 1) s («' ! '. J I A fe . »; o, 0). 

a,£,(<</UM) 

with 

L 

S(a, i, j I yS, t, /; O, 6) = £&(a, i, j I A *, /; O, 6) 
and 

£,(«, i, j I /?, fc, I; O, 8) = f„(J3, k, I) q(a, i, j | 0, k, I; 8) e(O n+1 \ a, i, j) b n+] (a, i, j) 

Here, 8 denotes the set of parameters to be learned, which are all coalescence rates Aa and the recombination rate p. 
The re-estimated parameters are obtained by optimizing over 8: 

8' =argmaxF((9, 0). 

9 

We can write the objective function as 

F(6, B)= Yj l0 §('?i( ff ; 5) <W <% Sji + q 2 (a I # 8}) E(a, i, j \ 0, k, I; O, 0). 

a,0,{i<j),{k<l) 

We now separate out the diagonal part of the sum: 
F(6, 6) = ^log(<7i(or; G) + q 2 (a \ a; 8)) H diag (a; O, 6) + 

a 

a.p 

and define the two separate sums: 

H diag (a \0,8)= ^S(a, i, j | a, i, j; O, 8) 

U<j) 

and 

S off (a\P;0,8)= a(a ' h j I A k, I; O, 8) (l - <5 0>j8 5 iJc 5 U ). 

{i<Mk<D 

Now the objective function reads: 

F(8, 8) = Y}°sM a ' °1 + «(« I «: «)) 3 d iag (a I O, fl) + £log(«i(a I # A) 3 oft(* I # O, 8) 

The advantage with this form is that H off can be efficiently computed using the marginal forward- and backward- variables from above. 
We first write 

L-l L-l 

Bofl(« I P) = Y&fc I # and S "ia g (*) = 

n=l n=l 

with 
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U<Mk<[) 

I i, j) b„+i(a, i, j) (l - <5 aj8 S iJc Sjj) 

U<Mk<l) 

= F„(P) q 2 (a | /?) B„(a) - 8 afi ^/„(a, i, j) q 2 (a | a) e(0„ +[ | a, j) b„ +l (a, i, j) 

(<</) 

and 

(<■</),(*</) 

= ^]/n(a. i) (9i( a ) + ?2(ff I a)) e(0„+i | a, i, j) b n+l (a, i, j) 

(!</) 

■ Constrained optimization 

The Baura- Welch algorithm in principle lets us re-estimate all coalescence rates and recombination rates. However, in practice we 
constrain the number of free parameters: 

Parameter Constraints: 

• For fixed a and u, all coalescence rates \'a with ipjj = u need to be equal, where </>; , denotes the index sub-group as defined in the 
beginning of chapter 4. 

• Parameters for neighbouring time intervals may be constrained to be equal, according to a defined pattern. For human data and one 
single population, we usually use 40 time intervals, with quantile boundaries as in equation 6, and the most recent 10 intervals left 
as freel parameters, and the other 30 intervals joined to pairs of 2, leading to 25 free coalescence rate parameters. For samples 
coming from two subpopulations, we usually use 30 time intervals, with the first 8 being independent, and the latter 22 being 
joined to pairs of two, leading to 19 free parameters within each population and 19 additional free parameters for the coalescence 
rate across populations. 

• All coalescence rates across populations must be smaller or equal to the mean rate within the two populations, such that the 
relative cross coalescence rate in interval a , y a = 2 A^ 2 / (A" + A 22 ) remains between 0 and 1 . 

• All coalescence rates must be positive. 

• The recombination rate must be positive. 

These constraints can be incorporated into numerical optimization techniques using logarithmic and tangential variable transforms (see 
[4]). 

6. Estimating the singleton branch length 

The local singleton branch length of the tree can be estimated approximately from the data. For that we build an HMM inspired from 
PSMC or PSMC (MSMC with 2 haplotypes), with observations being a sequence of "1", "0" and Here, "1" denotes a position with 
minor allele count 1 (a singleton), "0" denotes any other called position and "." denotes missing data. The hidden state of this HMM is 
the singleton branch length T s . It is straight forward to see that the emission probability of that model is: 

e(l | T s ) = l-/i T s , e(0 | T s ) = n T s , e{. \ T s ) = 1. 

This is very similar to the PSMC/PSMC emission probability, with the only difference being a missing factor 2 in front of T s , see 
below. 

It is difficult to derive an analytically exact transition probability for that model, since the recombination process that changes one tree 
to another does not only depend on the singleton branch length. However, we find that simple heuristics are good enough: We know 
that most recombination events that change the singleton branch length occur within the singleton branch length itself, which happens 
with probability p T s . For the probability conditional on there being a recombination event, we now simply approximate the process by 
the analogous expression from PSMC, equations 1, 2 and 3, using a constant population size (to avoid having to re-estimate T s every 
iteration of MSMC) and replacing / = TJ2. The factor 1/2 is important because PSMC considers half the total branch length as its 
hidden state, whereas here we use the branch length T s itself as hidden state. 

We again discretize the state space similarly to MSMC using quantile boundaries: 
r,, = -log|l-— j/(2+l/(M-l)) 

where 2+1 /(M - 1) is the expected singleton branch length: It is the sum of the leaf branch length of the tree (which is 2 in units of 
2 A'o) and that part of the tree which gives rise to variants with derived allele frequency M - 1 (which is 1 / (M - 1) in units of 2 2Vo) ■ 

We then run this HMM via the forward- backward-algorithm to obtain local posterior probabilities. For the discretization intervals, we 
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We then use the maximum posterior path as best local estimate of T s . The reason why this simple approach works so well despite all the 
approximations is the fact that the maximum likelihood path of T s is very much dominated by the local emission rates, and not so much 
by the transition rates. 

■ Testing the singleton branch length estimation with simulations 

To test how the approximations in our singleton branch length estimation influence the demographic inference, we can use the true 
singleton branch length locally directly to use for T s in the MSMC emission probability. As shown in Supplementary Figure 3, this 
comparison reveals that both the HMM estimation of T s and the true value obtained from the actual trees from the simulation are 
remarkably similar, revealing almost no effect of either the approximation involved in using PSMC's transition probability, nor in using 
a constant effective population size for the estimation. 

There is one important note about comparing with the true singleton branch length from simulations. It is quite tedious to compute from 
the Newick-trees (output from MaCS [6] and ms [5] coalescent simulators) the full minor allele singleton branch length. As defined 
above, T s does not only include true singletons, but also mutations with derived allele count M - 1. In these comparisons, we therefore 

change the definition slightly to T s , which includes only singletons with derived allele count 1. This is very easy to extract from the 
Newick-tree format, since it is simply the sum of all leaf branches. Correspondingly, in the MSMC emission probability we have to use 

a slightly different term for category 2: e(0 \ t, i, j) = p.{f s - 2 f)/(M - 2), which is straight forward to see, given the slightly simpler 

definition of T s . We use the modified definition T s only for the MSMC runs using the true singleton branch lengths (dotted lines in 

Supplementary Figures 3). 

7. Coalescent simulations 

■ Zig-Zag simulation 

We first generated a single population with a zig-zag type population history, implemented in ms [5] as a sequence of exponential 
growths and declines: 

ms 4 1 -t 7156.0000000 -r 2000.0000 10000000 -eN 0 5 -eG 0.000582262 1318.18 -eG 
0.00232905 -329.546 -eG 0.00931619 82.3865 -eG 0.0372648 -20.5966 -eG 0.149059 5.14916 -eN 
0.596236 0.5 -T 



Here, the first parameter (4) is the number of haplotypes, which we varied among 2, 4 and 8. 

■ Split simulation 

In the second scenario we simulated a single population with MaCS [6] that split into two constant size populations. The command line 
for this case is: 

macs 4 30000000 -t 0.0007156 -r 0.0002 -12 2 2 -ej 0.116 2 1 -T 



We again generated a dataset with 8 haplotypes for this scenario, replacing the first command line argument with "8", and the three 
parameters after "-I" with "2 4 4". Also, the parameter following "-ej" determines the scaled time of the split, where 0.116 corresponds 
a split lOOkya using our generation time and mutation rate. We simulated various values for the split time between lOkya and 150kya, 
simply scaling the parameter after "-ej". 

Since MaCS can simulate faster and more efficiently, we simulate 30Mb per simulation. With ms, we only simulate 10Mb. Note that ms 
is needed to simulate negative growths, which MaCS currently cannot do. 

■ Simulations with sharp population size changes 

We simulated two histories with sharp population size changes. One which mimics the true changes observed in CEU (see Supplemen- 
tary Figure 2a): 

macs 4 30000000 -t 0.0007156 -r 0.0002 -eN 0.0 10.8300726663 -eN 0.00116452394261 
1.08300726663 -eN 0.0174678591392 0.21660145332 6 -eN 0.0465809577045 1.08300726663 -eN 
0.0873392956959 3.24902179989 -eN 0.232 904788522 1.08300726663 -T 



and one which mimics the history observed in YRI (see Supplementary Figure 2b): 



macs 4 30000000 -t 0.001 -r 0.0004 -eN 0.0 8.25 -eN 0.0025 0.825 -eN 0.0416666666667 2.475 
-eN 0.166666666667 0.825 -T 



Again, we replaced the first parameter "4" with "2" and "8" to simulate fewer or more haplotypes. 
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■ Population split with migration 

This command line simulates a population split lOOkya with subsequent migration of 0.0002 per generation until 50kya, as seen in 
Supplementary Figure 2c. 

ms 4 10000000 -t 10000 -r 4000 -12 2 2 -ej 0.116 2 1 -eM 0.058 16 -T 

■ Population split with population size changes 

This simulates a population that split into two populations 120kya with varying population size changes resembling the CEU/YRI split 
without any migration, as seen in Supplementary Figure 3d. 

macs 4 30000000 -t 0.001 -r 0.0004 -12 2 2 -eN 0 9.5 -en 0.000833333333333 1 0.95 -en 
0.0025 2 0.95 -en 0.0125 1 0.19 -en 0.0333333333333 1 0.95 -en 0.0416666666667 2 2.85 -ej 
0.05 2 1 -eN 0.166666666667 0.95 -T 



■ Hotspot simulations 

We used the software msHot [7] to simulate sequences under the true recombination map in Humans, obtained from the HapMap 
project. We simulated 100 chromosomes, each with 1Mb long sequences for 4 haplotypes. We chose the same command line as for the 
zig-zag simulation, with a scaled recombination rate of 4 No r = 0.00055, a scaled mutation rate of 4 N 0 p = 0.0007156 and additional 
parameters 



-v n <start_l> <end_l> <1_1> <start_2> <end_2> <1_2> ... <start_n> <end_n> <l_n>, 



where each block of recombination rates is taken from a random chunk of the true human recombination map. The <l_i> values are 
given as multiples of the average recombination rate in humans of 1 cM/Mb. 



8. Comparison with diCal 

We applied diCal [8] to 10Mb of simulated sequence from 8 haplotypes. This sequence length was used in [8] to demonstrate the 
method, it is currently not practical to run it on longer data sets. We used the following command line to run diCal: 



java -Xmx65G -d64 -jar diCal-vl . 2/diCal . jar -F "chrl . f asta, chr2 . f asta, . . . , chr 10 . f asta" -c 
10 -I params.txt -n 8 -p "3 2 2 2 2 2 2 2 3" -t 1 



where "chrl .fasta", "chr2.fasta",... denote 10 fasta files, each with 8 haplotypes of length 1Mb. We also tried "-t 0.5" and "-t 0.1" (see 
Supplementary Figure 10). 



9. Processing Genomic Data 

In contrast to PSMC [1], we do not need to bin all data into bins of lOObp but can simply go through every basepair in the genome. As 
described above, we optimized MSMC for efficient traversal of SNP-free homozygous segments, as well as missing data segments. 

In practice, the MSMC implementation takes as input files simple tab-separated files, one for each chromosome, such as this: 
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CCCA 
AAAG 

CCCT,CCTC 



Here, the columns are: 

1. Chromosome 

2. Position of the SNP in the chromosome 

3. Number of called bases since the last SNP, including the site at this position. 



4. The alleles at the site in question. Comma-separated variations can account for missing phasing information, as in seen in the line 
before the last line in the example above. In practice, we simply average the emission probability at that site over all variations. 
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We account for missing data in this format through the third column. This number is always larger than zero since it always includes 
the site at that position. However, if the number of called bases is smaller than the physical distance since the last SNP, the rest is 
assumed to be missing. This encoding of missing data neglects information about the exact positions of missing bases, but simply gives 
the total number of them since the last called site. In practice, we assume that all missing bases since the last called base form a 
contiguous block right after the last called base. This way we only need to apply the recursion in the forward- and backward vector 
twice: once for the block of missing data and once for the block of homozygous called bases. The error in this approximation should be 
negligible, since the forward- and backward variables will typically change on much larger genomic lengthscales than between 
individual SNPs, such that the exact distribution of missing bases between two SNPs should not affect the inference. 

In our implementation on github, we provide scripts that facilitate the generation and processing of these input files from e.g. BAM 
files or Complete Genomics VCF files. We also provide additional documentation about the software as a README on github. 

10. Relationship to Island-Migration models 

In deriving the transition- and equilibrium probabilities for times of first coalescences we have parameterized our model using effective 
coalescence rates between all pairs of lineages, X'-'(t). This parameterization is very general and allows us to model a) scenarios in 
which all haplotypes are sampled from one population, X'''(t) = X(t), with no differences between rates between different lineages; and 
b) scenarios with haplotypes sampled from different subpopulations. In the latter case, it is interesting to ask how our parameterization 
relates to a model with two separated populations and time-dependent migration between them. 

Consider the case of two populations with time-dependent population sizes Ni(t) and N 2 (t), and a time-dependent and symmetric 
migration rate m(t) between them. We believe that such a parameterization is equivalent to a parameterization using coalescence rates 
within populations, A. n (f) and X 22 (t), and a cross coalescence rate X l2 {t), where the indices here mean populations, not lineages. We 
mean by "equivalent" that for every two-island model with symmetric migration rates, parameterized by {Ni(t), N 2 (t), m(t)), there 
exists a mapping 

M(f), N 2 (t), m(t)\ -» {X u (f), X n {t), X 22 (t)} 

such that the distribution of first coalescence times, q 0 (t), is the same in both models. 

We can indeed show this for the special case of two haplotypes drawn from different subpopulations, as shown below. We further 
hypothesize that the equivalence holds for more general island-migration models, too, but we emphasize that we cannot prove this. 

It is important to highlight the fact that what we term "relative cross-coalescence rate", defined as y(t) = 2 A. 12 (f)/(A. n (f) + X 22 (t)) in the 
article, is not equivalent to the migration rate. The exact relationship is more subtle, even for the special case of two haplotypes (see 
below) and in principle would involve numerical solutions to differential equations. However, intuitively, it may help to imagine the 
migration rate m(t) as being closely related to the rate of change of the relative cross-coalescence rate y(t): Consider a situation in 
which two populations have been separated since time T, but that there was a brief period of relatively strong migration between them 
around time T m < T, where times are counted backwards in time. The expected relative cross coalescence rate y(t) would then be: 

( 0 ioxt<T m 
y(t) = \ 0<rm<l forT n <t<T 
{ 1 for?> T, 

i.e. a step-like function, in which the brief pulse of migration would increase y(t) to some intermediate value y m , before it would reach 1 
at the split time T. Clearly, if migration is more ongoing rather then short and strong, y(t) would increase more steadily, in a well 
defined way which is however difficult to derive analytically. 

■ Constant migration rate, two haplotypes 

Consider two haplotypes, each of which drawn from two separate populations. Let us first consider constant parameters Nfo) = Nj and a 
constant migration rate m(t) = m. As shown in [9] the distribution of first coalescence times q 0 (f) can in that case be computed from the 
matrix exponential of a Markov rate matrix. The five states of the Markov process are then: S\\, both genes are in population 1; S 22 , 
both genes are in population 2; S12, one gene is in population 1 and the other is in population 2; Si , the genes have coalesced and the 
single gene is in population 1; and S2, the genes have coalesced and the single gene is in population 2. It is straight forward to write 
down the 5 x5 rate matrix, Q, for this process using the three parameters N { , N 2 and m, as shown in the referenced article. The probabil- 
ity density of first coalescence times can then be expressed as the probability that starting with state S 12 the system will eventually be in 
one of the absorbing states Si or S 2 , which can be expressed by a matrix exponential: 

P S: (t) = l S xp(Qt)] ShSll 

where the indices denote the end and the start state. The probability density of coalescent times, g 0 floboith( f ) is tnen 
1 1 

<7o,HoboIth(0 = PsM) + Pfc(f) 

2Ni 2N 2 

We can now very easily show that a single time-dependent parameter X c (t), which denotes the effective rate of cross-coalescence 
between the two lineages, is sufficent to parameterize this probability density. As defined in chapter 1, the equilibrium probability of 
first coalescence times in this special case can be expressed as 
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jfV(j')df'). 



q 0 (t) = X c (t) expj 
We can rewrite this as 
dUf) 

*>(*) = exp(-(i(0-i(0))) 

dt 

with L(J) being defined as the anti-derivative dL(t)/dt = X c (t). It is then clear that the function L(t) which solves the differential 
equation 

dUt) 

exp(-(L(r) - L(0))) = ? 0 floboith(0 

dt 

leads directly to the desired solution for X c (t). This shows that for this special case, both parameterizations are equivalent. 

■ Time dependent migration rate 

If population sizes and migration rates are time-dependent, the approach from [9] needs to be modified. In that case, the Markov 
transition matrix is itself time-dependent, expressed as Q(t), and the formula for the equilibrium density can still be expressed as a 
matrix exponential, but with an additional integral: 

p s ,(t) = [^(fandt)] s ^ 

which still can in principle be parameterized via a single rate X c (t) as shown above. 

■ More general cases 

In the more general case, if more than one haplotype is sampled from either or both of the two populations, if more than two popula- 
tions are present, or if recombination comes into play, we still hypothesize that effective coalescence rates within and between popula- 
tions are sufficient to model any island-migration model with symmetric migration rates. But as long as we lack a formal proof of this, 
we acknowledge that it is not entirely clear whether our parameterization is exactly mappable to an island model or whether it only 
approximates it in the most general case. 

11. Normalization Proofs 

■ Helpers 

We define the antiderivative (/>'•> (t) with — 0 ij (r) = \ iJ (t). Also we define <I>'(f) = £!f = , <p'' j (t) such that — <t>'(i) = A'(t), and similarly 

dt ■' dt 

*(') = L<jf J (f) such that j®^) = A W- 

We then have 



L m (h ■ t 2 ) = exp(- J^ 2 A m (v) d vj = « 
and 

£ 2 A(v)dvj = , 



(10) 

'*( S ) 



L(t i ; h) = exp 
We then have: 

A(«) Us; u)du= *' («) du = e* ( ' 5) e~ z dz = e m { e - m - t^**") = 1 - Us; t) 

and similarly 

u)L m (s;u)du = \- L m (s;t) (11) 

■ Normalization of the equilibrium probability 

The equilbrium probability is 

q(i,j,t) = X i -j(t)U0;t). 

We check normalization by integrating 



jf A» 
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qU,j,t)dt = J J\ iJ (t)L(0;t)dt = J A(t) L(0; t) d t = 1 - L(0; oo) = 1 



using equation 10. 

■ Normalization of the conditional transition probability 

In a bit more formal style, the total transition probability can be expressed using Kronecker-Deltas and the Heavyside-function 0. We 
have 



q(i, j, t\k, I, s, u, m) = 

I C \ { (8 ni + 5 mJ ) A'-'(f) L m (u; t) 0(f - u) for / < s 

8(t - s) Si k 8 u \ \ m - m (K) L m (u; k) dK + (l - 6 mi )(l - 8 nl ] L m (u; t)\+{) . . 

' h \X 1 * n ■" ) \{8 mJ[ + 8 m) )X'-'(t)L'"(u;s)Us;t) for t>s. 

Now the normalization needs to hold for the total parameter space of the triple (;', j, t): 

M p» 

y I qii, j, 1 1 k, I, s, u, m) dt 

Yu^ 5(t - ') S i* 5 i)[ L "'( u ' *) dK + {\- S mJc ) (1 - S mJ ) L m (u- ?)) dt 

M _ 

Yj J (<V< + s mj) x ' J (t) L m (u; t) 0(f - u) dt 

5JP 



M 



M 

+ T, | {S n jc + S ml )X' ! [i)L' , (,r.s)Us:i)di = 
•<j' ' 



We compute all three terms separately: 
First Term: 



g jTW - s) <5 a Sjj[f u *- mj " W *) <f * + (l - <W) (1 - <W) i m ("; o) 

= Pa" ,j » L m (u; K)dK + {\- S mJl ) (1 - <?„,,) L m (u; s) 

Ju 



Second Term: 



M 

V (<V< + <5 mJ ) A'>>(f) L m (u; f) ®(t - u) dt 

M 

= y (5 m j + 5 m ,,)\ iJ (t)L m (u;t)dt 
><J Ju 

= Yj f X mJ (t)L m (u-t)dt 

\h m (t) - X mm (t)) L m (u;t) dt 



Third Term: 



J f" (<W + M A' J '(f) s) L(s; t) dt 
i<i Js 

= (<W + s mi) Lm ("; f° A (t) Us; t) dt 
The sum of all terms: 

f A m -"V) L"\u; K)dK + {\- 6 mi ) (1 - 6 mi ) L m (u; s) 

Ju 

+ P (A'"(f) - X mfn (t))L m ( U ; t)dt 

Ju 



(12) 
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+(<W + 5 mJ ) L m (u; s) £° A(r) Us; t) dt 
-■{l-S mJc ){l-S mJ )L m (u;s)+ f A m (t)L m (u;t)dt + (6 mJc + S m j)L m (u; S ) P A(f) L(s; t) dt 

Ju Js 

■ (i - <W) (i - 8 m ,i) *) + a- £ m («; *)) + (<W + ^ m ("; *) a - °°» 

■ L m (u;s) + (\ -L n {u\s)) 
■■ 1 



References 

1 . Li , H . and R . Durbin , Inference of human population history from individual whole-genome sequences . Nature , 20 1 1 . 

2. Marjoram, P. and J.D. Wall, Fast "coalescent" simulation. BMC genetics, 2006. 7: p. 16. 

3. McVean, G.A.T. and N.J. Cardin, Approximating the coalescent with recombination. Philosophical transactions of the Royal Society 

of London Series B, Biological sciences, 2005. 360(1459): p. 1387-1393. 

4. Press, W.H., Numerical recipes : the art of scientific computing. 3rd ed2007, Cambridge, UK ; New York: Cambridge University 

Press, xxi, 1235 p. 

5. Hudson, R. R. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics, 2002, 18: 337-338. 

6. Chen, G.K., P. Marjoram, and J.D. Wall, Fast and flexible simulation ofDNA sequence data. Genome Res, 2009. 19(1): p. 136-142. 

7. Hellenthal, G. and M. Stephens, msHOT: modifying Hudson's ms simulator to incorporate crossover and gene conversion hotspots, 

2006, Bioinformatics. 

8. Sheehan, S., Harris, K., & Song, Y. S. (2013). Estimating variable effective population sizes from multiple genomes: a sequentially 

markov conditional sampling distribution approach. Genetics, 194(3), 647-662. doi: 10. 1534/genetics.l 12.149096 

9. Hobolth, A., L. N0rvang Andersen and Thomas Mailund, On Computing the Coalescence Time Density in an Isolation-With- 

Migration Model With Few Samples, Genetics, 2011. 187: p. 1241-1243 



